v0.12.3: Reliability wave — sync deadlock, search timeout scoping, wikilinks, orphans#216
Merged
sync.ts wraps the add/modify loop in engine.transaction(), and each importFromContent inside opens another one. PGLite's _runExclusiveTransaction is a non-reentrant mutex — the second call queues on the mutex the first is holding, and the process hangs forever in ep_poll. Reproduced with a 15-file commit: unpatched hangs, patched runs in 3.4s. Fix drops the outer wrap; per-file atomicity is correct anyway (one file's failure should not roll back the others). (cherry picked from commit 4a1ac00)
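The deadlock described above can be reproduced in miniature. The sketch below is an illustration only: `ExclusiveLock`, `nestedSync`, `perFileSync`, and `withTimeout` are hypothetical names, and the lock is a simplified stand-in for a non-reentrant mutex, not PGLite's actual implementation.

```typescript
// Hypothetical non-reentrant lock: each run() waits for every earlier run() to settle.
class ExclusiveLock {
  private tail: Promise<void> = Promise.resolve();

  run<T>(fn: () => Promise<T>): Promise<T> {
    const result = this.tail.then(fn);
    // Later callers queue behind this one, whether it succeeds or fails.
    this.tail = result.then(() => undefined, () => undefined);
    return result;
  }
}

// Resolves with the task's value, or "timeout" if it has not settled within ms.
function withTimeout<T>(task: Promise<T>, ms: number): Promise<T | "timeout"> {
  return new Promise<T | "timeout">((resolve, reject) => {
    const timer = setTimeout(() => resolve("timeout"), ms);
    task.then(
      value => { clearTimeout(timer); resolve(value); },
      error => { clearTimeout(timer); reject(error); },
    );
  });
}

// Unpatched shape: an outer transaction wraps the loop and each file opens another.
// The inner run() queues behind the outer holder, which is itself awaiting the inner one.
function nestedSync(lock: ExclusiveLock, files: string[]): Promise<string[]> {
  return lock.run(async () => {
    const imported: string[] = [];
    for (const file of files) {
      await lock.run(async () => { imported.push(file); });
    }
    return imported;
  });
}

// Patched shape: no outer wrap, one short transaction per file.
async function perFileSync(lock: ExclusiveLock, files: string[]): Promise<string[]> {
  const imported: string[] = [];
  for (const file of files) {
    await lock.run(async () => { imported.push(file); });
  }
  return imported;
}
```

Racing `nestedSync` against `withTimeout` never yields a result, while `perFileSync` completes; a failure in one file's callback rejects only that file's `run()`.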
Reads src/commands/sync.ts verbatim and asserts no uncommented engine.transaction() call appears above the add/modify loop. Protects against silent reintroduction of the nested-mutex deadlock that hung > 10-file syncs forever in ep_poll.
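A source-level guardrail of this kind could look roughly like the following. This is a sketch under assumptions: `stripComments`, `hasOuterTransaction`, and the loop marker string are hypothetical, and the comment stripping is deliberately naive (it does not understand string literals).

```typescript
// Removes /* block */ and // line comments. Naive: does not account for string literals.
function stripComments(source: string): string {
  return source
    .replace(/\/\*[\s\S]*?\*\//g, "")
    .replace(/\/\/.*$/gm, "");
}

// True when a live engine.transaction( call appears before the add/modify loop.
// loopMarker is whatever text identifies the start of that loop in the file under test.
function hasOuterTransaction(source: string, loopMarker: string): boolean {
  const code = stripComments(source);
  const loopAt = code.indexOf(loopMarker);
  const head = loopAt === -1 ? code : code.slice(0, loopAt);
  return /engine\.transaction\s*\(/.test(head);
}
```

In a real test the `source` argument would be the file contents read from disk; passing strings keeps the check fast and database-free.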
parseEmbedding() throws on structural corruption. That is the right call for ingest/migrate paths, where silent skips would be data loss. It is the wrong call for search/rescore paths, where one corrupt row in 10K would kill every query that touches it.

tryParseEmbedding() wraps parseEmbedding in try/catch: it returns null on any shape that would throw, and warns once per session so the bad row is visible in logs. Use it anywhere we'd rather degrade ranking than blow up the whole query.

Retrofit postgres-engine.getEmbeddingsByChunkIds (the #175 slice call site); the 5-line rescore loop was the direct motivator. Keep the throwing parseEmbedding() for everything else (pglite-engine rowToChunk, migrate-engine round-trips, ingest).
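The strict/tolerant split can be sketched as follows. The `parseEmbedding` body here is an assumed stand-in (the real parser may accept other shapes), and `usableEmbeddings` is a hypothetical rescore-style caller, not the actual getEmbeddingsByChunkIds.

```typescript
// Assumed stand-in for the strict parser: accepts a JSON array string or a number array.
function parseEmbedding(raw: unknown): number[] {
  const value: unknown = typeof raw === "string" ? JSON.parse(raw) : raw;
  if (
    !Array.isArray(value) ||
    value.some((x: unknown) => typeof x !== "number" || !Number.isFinite(x))
  ) {
    throw new Error("corrupt embedding");
  }
  return value as number[];
}

let warned = false;

// Tolerant sibling: null instead of a throw, with a single warning for the first bad row.
function tryParseEmbedding(raw: unknown): number[] | null {
  try {
    return parseEmbedding(raw);
  } catch (err) {
    if (!warned) {
      warned = true;
      console.warn(`skipping corrupt embedding: ${(err as Error).message}`);
    }
    return null;
  }
}

// Hypothetical rescore-style caller: corrupt rows are dropped, the rest stay usable.
function usableEmbeddings(rows: Array<{ id: number; embedding: unknown }>): Map<number, number[]> {
  const out = new Map<number, number[]>();
  for (const row of rows) {
    const vec = tryParseEmbedding(row.embedding);
    if (vec !== null) out.set(row.id, vec);
  }
  return out;
}
```

The caller decides which behavior it wants by choosing the function, so ingest and migration code keeps failing loudly.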
searchKeyword and searchVector run on a pooled postgres.js client
(max: 10 by default). The original code bounded each search with
    await sql`SET statement_timeout = '8s'`
    try { await sql`<query>` }
    finally { await sql`SET statement_timeout = '0'` }
but every tagged template is an independent round-trip that picks an
arbitrary connection from the pool. The SET, the query, and the reset
could all land on DIFFERENT connections. In practice the GUC sticks
to whichever connection ran the SET and then gets returned to the
pool — the next unrelated caller on that connection inherits the 8s
timeout (clipping legitimate long queries) or the reset-to-0 (disabling
the guard for whoever expected it). A crash in the middle leaves the
state set permanently.
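A toy model makes the leak concrete. `ToyPool` below is purely illustrative: it hands out connections round-robin, whereas a real pool picks whichever connection is free, but the effect on session-level settings is the same.

```typescript
interface Conn {
  id: number;
  statementTimeout: string; // session-level GUC as it currently stands on this connection
}

// Toy pool: every exec() models one independent tagged-template round-trip.
class ToyPool {
  readonly conns: Conn[];
  private next = 0;

  constructor(size: number) {
    this.conns = Array.from({ length: size }, (_, id) => ({ id, statementTimeout: "0" }));
  }

  exec(statement: string): Conn {
    const conn = this.conns[this.next];
    this.next = (this.next + 1) % this.conns.length;
    const match = /^SET statement_timeout = '(\w+)'$/.exec(statement);
    if (match) conn.statementTimeout = match[1]; // sticks to this connection only
    return conn;
  }
}

const pool = new ToyPool(3);
const setOn = pool.exec("SET statement_timeout = '8s'");  // lands on one connection
const queryOn = pool.exec("SELECT 1");                    // runs on another, unguarded
const resetOn = pool.exec("SET statement_timeout = '0'"); // resets a third
```

After these three calls the first connection still carries the 8s cap, and the query itself ran with no cap at all.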
Wrap each search in sql.begin(async sql => …). postgres.js reserves
a single connection for the transaction body, so the SET LOCAL, the
query, and the implicit COMMIT all run on the same connection. SET
LOCAL scopes the GUC to the transaction — COMMIT or ROLLBACK restores
the previous value automatically, regardless of the code path out.
Error paths can no longer leak the GUC.
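The transaction-scoped shape might look like this. To keep the sketch runnable without a database, `SqlLike` is an assumed minimal structural type covering only the two postgres.js features used here (tagged-template queries and `begin`); `searchWithTimeout`, `makeFakeSql`, and the example query are hypothetical.

```typescript
type Row = Record<string, unknown>;

// Assumed minimal shape of the client: tagged-template queries plus begin().
interface SqlLike {
  (strings: TemplateStringsArray, ...values: unknown[]): Promise<Row[]>;
  begin<T>(body: (tx: SqlLike) => Promise<T>): Promise<T>;
}

// SET LOCAL and the query share the transaction's connection; COMMIT or ROLLBACK
// restores the previous statement_timeout, so no explicit reset is needed.
async function searchWithTimeout(sql: SqlLike, term: string): Promise<Row[]> {
  return sql.begin(async tx => {
    await tx`SET LOCAL statement_timeout = '8s'`;
    return tx`SELECT slug FROM pages WHERE title ILIKE ${"%" + term + "%"}`;
  });
}

// Fake client for illustration: records the statements it would have sent.
function makeFakeSql(log: string[]): SqlLike {
  const run = (strings: TemplateStringsArray, ..._values: unknown[]): Promise<Row[]> => {
    log.push(strings.join("?"));
    return Promise.resolve([]);
  };
  async function begin<T>(body: (tx: SqlLike) => Promise<T>): Promise<T> {
    log.push("BEGIN");
    try {
      const out = await body(client);
      log.push("COMMIT");
      return out;
    } catch (err) {
      log.push("ROLLBACK");
      throw err;
    }
  }
  const client: SqlLike = Object.assign(run, { begin });
  return client;
}
```

With the fake client, the recorded order is BEGIN, SET LOCAL, the query, then COMMIT, and no statement resets the timeout to 0.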
No API change. Timeout value and semantics are identical (8s cap on
search queries, no effect on embed --all / bulk import which runs
outside these methods). Only one transaction per search — BEGIN +
COMMIT round-trips are negligible next to a ranked FTS or pgvector
query.
Also closes the earlier audit finding R4-F002 which reported the same
pattern on searchKeyword. This PR covers both searchKeyword and
searchVector so the pool-leak class is fully closed.
Tests (test/postgres-engine.test.ts, new file):
- No bare SET statement_timeout remains after stripping comments.
- searchKeyword and searchVector each wrap their query in sql.begin.
- Both use SET LOCAL.
- Neither explicitly clears the timeout with SET statement_timeout=0.
Source-level guardrails keep the fast unit suite DB-free. Live
Postgres coverage of the search path is in test/e2e/search-quality.test.ts,
which continues to exercise these methods end-to-end against
pgvector when DATABASE_URL is set.
(cherry picked from commit 6146c3b)
… pages

Surfaces pages with zero inbound wikilinks. Essential for content enrichment cycles in KBs with 1000+ pages.

By default, filters out auto-generated pages, raw sources, and pseudo-pages where no inbound links are expected; pass --include-pseudo to disable the filter. Supports text (grouped by domain), --json, and --count outputs. Also exposed as the find_orphans MCP operation.

Tests cover basic detection, filtering, and all output modes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
(cherry picked from commit f50954f)
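The core of orphan detection is a set difference over the link graph. The sketch below assumes an in-memory `Page` shape and a caller-supplied `isPseudo` predicate; both are hypothetical, and the real command presumably works from the database.

```typescript
interface Page {
  slug: string;
  links: string[]; // slugs this page links out to
}

// Pages that no other entry links to, minus anything the caller marks as pseudo.
function findOrphans(
  pages: Page[],
  isPseudo: (slug: string) => boolean = () => false,
): string[] {
  const linked = new Set<string>();
  for (const page of pages) {
    for (const target of page.links) linked.add(target);
  }
  return pages
    .map(page => page.slug)
    .filter(slug => !linked.has(slug) && !isPseudo(slug))
    .sort();
}

// Groups slugs by their first path segment, mirroring a domain-grouped text report.
function groupByDomain(slugs: string[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const slug of slugs) {
    const cut = slug.indexOf("/");
    const domain = cut === -1 ? "(root)" : slug.slice(0, cut);
    const bucket = groups.get(domain) ?? [];
    bucket.push(slug);
    groups.set(domain, bucket);
  }
  return groups;
}
```

Passing no predicate corresponds loosely to --include-pseudo: nothing is filtered out of the result.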
…n canonical extractor

extractEntityRefs now recognizes both syntaxes equally:

  [Name](people/slug)      -- upstream original
  [[people/slug|Name]]     -- Obsidian wikilink (new)

Extends DIR_PATTERN to include domain-organized wiki slugs used by Karpathy-style knowledge bases:

- entities (legacy prefix some brains keep during migration)
- projects (gbrain canonical, was missing from regex)
- tech, finance, personal, openclaw (domain-organized wiki roots)

Before this change, a 2,100-page brain with wikilinks throughout extracted zero auto-links on put_page because the regex only matched markdown-style [name](path). After: 1,377 new typed edges on a single extract --source db pass over the same corpus.

Matches the behavior of the extract.ts filesystem walker (which already handled wikilinks as of the wiki-markdown-compat fix wave), so the db and fs sources now produce the same link graph from the same content.

Both patterns share the DIR_PATTERN constant so adding a new entity dir only requires updating one string.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(cherry picked from commit 1cfb156)
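Two regexes built from one shared directory alternation could be sketched like this. The `DIR_PATTERN` value lists only the directories named in this commit message (the real constant may contain more), and the fallback of using the slug as the name for a bare `[[dir/slug]]` is an assumption.

```typescript
// Assumption: only the directories mentioned above; the real constant may list more.
const DIR_PATTERN = "people|entities|projects|tech|finance|personal|openclaw";

interface EntityRef {
  name: string;
  slug: string;
}

// [Name](dir/slug)
const MD_LINK = new RegExp(`\\[([^\\]]+)\\]\\(((?:${DIR_PATTERN})/[^)\\s]+)\\)`);
// [[dir/slug|Name]] or [[dir/slug]]
const WIKILINK = new RegExp(`\\[\\[((?:${DIR_PATTERN})/[^\\]|]+)(?:\\|([^\\]]+))?\\]\\]`);

// Runs one pattern globally over the text and maps each match to a ref.
function collect(
  pattern: RegExp,
  text: string,
  toRef: (m: RegExpExecArray) => EntityRef,
): EntityRef[] {
  const re = new RegExp(pattern.source, "g"); // fresh lastIndex on every call
  const out: EntityRef[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) out.push(toRef(m));
  return out;
}

function extractEntityRefs(text: string): EntityRef[] {
  return [
    ...collect(MD_LINK, text, m => ({ name: m[1], slug: m[2] })),
    ...collect(WIKILINK, text, m => ({ slug: m[1], name: m[2] ?? m[1] })),
  ];
}
```

Because both expressions interpolate the same string, adding a directory is a one-line change, which is the property the commit message calls out.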
Add two v0.12.1-era reliability checks to `gbrain doctor`:

- `jsonb_integrity` scans the 4 known write sites from the v0.12.0 double-encode bug (pages.frontmatter, raw_data.data, ingest_log.pages_updated, files.metadata) and reports rows where jsonb_typeof(col) = 'string'. The fix hint points at `gbrain repair-jsonb` (the standalone repair command shipped in v0.12.1).
- `markdown_body_completeness` flags pages whose compiled_truth is <30% of the raw source content length when raw has multiple H2/H3 boundaries. Heuristic only; suggests `gbrain sync --force` or `gbrain import --force <slug>`.

Also adds test/e2e/jsonb-roundtrip.test.ts, the regression coverage that should have caught the original double-encode bug. Hits all four write sites against real Postgres and asserts jsonb_typeof='object' plus `->>'key'` returns the expected scalar.

Detection only: doctor diagnoses, `gbrain repair-jsonb` treats. No overlap with the standalone repair path.
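Both checks reduce to small predicates, sketched here outside the database. `isDoubleEncoded` and `looksTruncated` are hypothetical names; reading "multiple H2/H3 boundaries" as "at least two" is an assumption, and the real checks run as queries against Postgres.

```typescript
// A double-encoded value is JSON whose top-level type is a string that itself holds JSON.
// This is the condition jsonb_typeof(col) = 'string' surfaces on a column meant to hold objects.
function isDoubleEncoded(storedJson: string): boolean {
  return typeof JSON.parse(storedJson) === "string";
}

// Flags a page when the compiled body is under 30% of the raw source length
// and the raw source has at least two H2/H3 headings (assumed meaning of "multiple").
function looksTruncated(compiledTruth: string, rawSource: string): boolean {
  const boundaries = (rawSource.match(/^#{2,3} /gm) ?? []).length;
  if (boundaries < 2) return false;
  return compiledTruth.length < 0.3 * rawSource.length;
}
```

Writing `JSON.stringify(JSON.stringify(obj))` into a JSON column reproduces the double-encode shape; a single `JSON.stringify(obj)` does not.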
Master shipped v0.12.1 (extract N+1 + migration timeout) and v0.12.2 (JSONB double-encode + splitBody + wiki types + parseEmbedding) while this wave was mid-flight. Ships the remaining pieces as v0.12.3:

- sync deadlock (#132, @sunnnybala)
- statement_timeout scoping (#158, @garagon)
- Obsidian wikilinks + domain patterns (#187 slice, @knee5)
- gbrain orphans command (#187 slice, @knee5)
- tryParseEmbedding() availability helper
- doctor detection for jsonb_integrity + markdown_body_completeness

No schema, no migration, no data touch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
CLAUDE.md:

- Add src/commands/orphans.ts entry
- Expand src/commands/doctor.ts with v0.12.3 jsonb_integrity + markdown_body_completeness check descriptions
- Update src/core/link-extraction.ts to mention Obsidian wikilinks + extended DIR_PATTERN (entities/projects/tech/finance/personal/openclaw)
- Update src/core/utils.ts to mention tryParseEmbedding sibling
- Update src/core/postgres-engine.ts to note statement_timeout scoping + tryParseEmbedding usage in getEmbeddingsByChunkIds
- Add Key commands added in v0.12.3 section (orphans, doctor checks)
- Add test/orphans.test.ts, test/postgres-engine.test.ts, updated descriptions for test/sync.test.ts, test/doctor.test.ts, test/utils.test.ts
- Add test/e2e/jsonb-roundtrip.test.ts with note on intentional overlap
- Bump operation count from ~36 to ~41 (find_orphans shipped in v0.12.3)

README.md:

- Add gbrain orphans to ADMIN commands block

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
joedanz added a commit to joedanz/pbrain that referenced this pull request on Apr 19, 2026
Pulls forward the v0.12.3 reliability fixes: sync deadlock, search-timeout scoping, tryParseEmbedding for search corruption tolerance, pbrain orphans command + find_orphans MCP op, and two new doctor checks (jsonb_integrity, markdown_body_completeness) that point at the repair-jsonb / sync --force remediation.

What changed:

- src/commands/sync.ts: dropped the outer engine.transaction() wrap so importFromContent's per-file transaction isn't nested. PGLite's _runExclusiveTransaction is non-reentrant, so the inner call used to park on the mutex forever once ~10 files hit the pipeline.
- src/core/postgres-engine.ts: searchKeyword / searchVector now scope statement_timeout via sql.begin + SET LOCAL so the GUC can't leak onto a pooled connection and clip unrelated long-running queries.
- src/core/utils.ts: new tryParseEmbedding() (returns null + warns once per process on bad input) for search/rescore paths where availability matters more than strictness. parseEmbedding() stays strict for migration/ingest paths.
- src/commands/orphans.ts: new. Domain-grouped report of pages with zero inbound wikilinks; --include-pseudo flag, --json, --count. Also wired as find_orphans MCP operation.
- src/commands/doctor.ts: +2 reliability checks. jsonb_integrity scans pages.frontmatter, raw_data.data, ingest_log.pages_updated, files.metadata for jsonb_typeof='string' rows (v0.12.0 residue); markdown_body_completeness flags pages with compiled_truth <30% of raw source when raw has multiple H2/H3 boundaries.

New tests: test/orphans.test.ts, test/postgres-engine.test.ts, test/e2e/jsonb-roundtrip.test.ts. doctor.ts and sync.ts existing tests updated with new check/deadlock assertions.

Upstream's src/core/link-extraction.ts (knowledge-graph layer) is NOT taken; it's Wave-2 material. All integrated code paths operate on the already-existing links/timeline_entries schema.

(cherry picked from commit 013b348)
Co-Authored-By: Garry Tan <garry@garrytan.com>
Summary
Reliability wave follow-up to v0.12.1 (extract N+1 + migration timeout) and v0.12.2 (JSONB double-encode + splitBody + /wiki/types + parseEmbedding). Lands the remaining community-sourced fixes from the same review pass, plus one graph-layer feature (gbrain orphans) that closes the loop on the v0.12 knowledge graph story.
No schema changes. No migration. No data touch. gbrain upgrade pulls it.
What ships
- Sync deadlock (#132, @sunnnybala): src/commands/sync.ts wrapped the whole import in engine.transaction, and importFromContent also wrapped per-file. PGLite's non-reentrant mutex deadlocked. Outer wrap removed; per-file atomicity preserved. Regression test asserts top-level engine.transaction is not called.
- statement_timeout scoped to the transaction (postgres-engine: scope search statement_timeout to the transaction #158, @garagon): searchKeyword / searchVector used SET statement_timeout='8s' + finally SET 0, but each tagged template picks an arbitrary pool connection. The cap leaked across the postgres.js pool and strangled unrelated queries. Now uses sql.begin + SET LOCAL. 5 regression tests including a source-level grep guardrail.
- [[WikiLinks]] as first-class edges (fix: markdown parsing bugs affecting wiki-style content #187 slice, @knee5): extractEntityRefs matches both [Name](people/slug) and [[people/slug|Name]]. DIR_PATTERN extended to entities, projects, tech, finance, personal, openclaw. Before: a 2,100-page brain extracted zero auto-links on put_page. After: 1,377 typed edges on a single pass.
- gbrain orphans command (fix: markdown parsing bugs affecting wiki-style content #187 slice, @knee5): surfaces pages with zero inbound wikilinks. Text/JSON/count outputs, domain grouping, --include-pseudo flag. Also exposed as find_orphans MCP operation.
- tryParseEmbedding() availability helper: new sibling of parseEmbedding() that returns null + warns once instead of throwing. Used on getEmbeddingsByChunkIds so one corrupt Supabase row degrades ranking instead of killing the query. Migration/ingest paths still throw.
- gbrain doctor detection checks: jsonb_integrity scans the four JSONB write sites and reports jsonb_typeof='string' rows; markdown_body_completeness heuristically flags truncated-body pages. Fix hints point at gbrain repair-jsonb (shipped in v0.12.2) and gbrain sync --force.
Credits
tryParseEmbedding split-by-call-site decision
All commits preserve Co-Authored-By trailers via git cherry-pick -x.
Test plan
- bun test (unit only): 1340 pass / 0 fail
- bun run test:e2e on fresh pgvector/pgvector:pg16 container: 129 pass / 5 skip / 0 fail
🤖 Generated with Claude Code